Contrastive Perplexity: A new evaluation metric for sentence level language models

Author

  • Kushal Arora
Abstract

Perplexity (per word) is the most widely used metric for evaluating language models, mostly due to its ease of computation, its lack of dependence on external tools such as a speech recognition pipeline, and a good theoretical justification for why it should work. Despite this, the metric has attracted no shortage of criticism, most of it centered on its lack of correlation with extrinsic metrics like word error rate (WER), its dependence on a shared vocabulary for model comparison, and its unsuitability for evaluating un-normalized language models. In this paper we address the last problem, the inability to evaluate un-normalized models, by introducing a new discriminative evaluation metric that predicts a model's performance based on its ability to discriminate between test sentences and their deformed versions. Because of its discriminative formulation, this approach works with un-normalized probabilities while retaining perplexity's ease of computation. We show a strong correlation between our new metric and perplexity across a range of models on WSJ datasets. We also hypothesize a stronger correlation of WER with our new metric than with perplexity, due to their similar discriminative objectives.
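The core idea, comparing a model's score for each test sentence against its score for a deformed copy, only needs a relative ordering of scores, which is why an un-normalized model suffices. A minimal sketch of that evaluation loop is below; the `score` function, the swap-based deformation, and the accuracy-style aggregation are all illustrative assumptions, not the paper's exact formulation:

```python
import random


def contrastive_accuracy(score, sentences, deform, seed=0):
    """Fraction of test sentences the model scores above a deformed copy.

    `score(sentence)` may return an un-normalized (log-)score: only the
    ordering between the true and deformed sentence matters, so no
    partition function is ever computed.
    """
    rng = random.Random(seed)
    wins = 0
    for s in sentences:
        d = deform(s, rng)
        if score(s) > score(d):
            wins += 1
    return wins / len(sentences)


def swap_deform(sentence, rng):
    """One illustrative deformation: swap two random word positions."""
    words = sentence.split()
    if len(words) < 2:
        return sentence
    i, j = rng.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    return " ".join(words)
```

Any scoring function that tends to rank well-formed sentences above their deformations will approach 1.0 under this sketch, which is the discriminative behaviour the metric is meant to capture.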


Similar papers

A Compositional Approach to Language Modeling

Traditional language models treat language as a finite state automaton on a probability space over words. This is a very strong assumption when modeling something inherently complex such as language. In this paper, we challenge this by showing how the linear chain assumption inherent in previous work can be translated into a sequential composition tree. We then propose a new model that marginal...


Generating Sentences by Editing Prototypes

We propose a new generative model of sentences that first samples a prototype sentence from the training corpus and then edits it into a new sentence. Compared to traditional models that generate from scratch either left-to-right or by first sampling a latent sentence vector, our prototype-then-edit model improves perplexity on language modeling and generates higher quality outputs according to ...


Segment Choice Models: Feature-Rich Models for Global Distortion in Statistical Machine Translation

This paper presents a new approach to distortion (phrase reordering) in phrase-based machine translation (MT). Distortion is modeled as a sequence of choices during translation. The approach yields trainable, probabilistic distortion models that are global: they assign a probability to each possible phrase reordering. These “segment choice” models (SCMs) can be trained on “segment-aligned” sente...


Sentence-level MT evaluation without reference translations: Beyond language modeling

In this paper we investigate the possibility of evaluating MT quality and fluency at the sentence level in the absence of reference translations. We measure the correlation between automatically-generated scores and human judgments, and we evaluate the performance of our system when used as a classifier for identifying highly dysfluent and ill-formed sentences. We show that we can substantially ...


Neural Lattice Language Models

In this work, we propose a new language modeling paradigm that has the ability to perform both prediction and moderation of information flow at multiple granularities: neural lattice language models. These models construct a lattice of possible paths through a sentence and marginalize across this lattice to calculate sequence probabilities or optimize parameters. This approach allows us to seam...



Journal:
  • CoRR

Volume: abs/1601.00248  Issue: -

Pages: -

Publication date: 2016